Advanced Machine Learning with scikit-learn

Presenter: Andreas Mueller

API Review


In [1]:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# train_test_split lives in sklearn.model_selection; sklearn.cross_validation was removed.
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LogisticRegression().fit(X_train, y_train)
print("predictions: %s" % lr.predict(X_test))
print("accuracy: %.2f" % lr.score(X_test, y_test))


predictions: [1 0 1 0 0 1 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 0 1 0 0]
accuracy: 0.88

Grid-Search and Cross-Validation
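
A minimal sketch of what this section covers, reusing the toy dataset from the API review. The parameter grid over C and the choice of 5 folds are illustrative assumptions, not values taken from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validation: 5 train/validation splits of the training set, one score per split.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print("cross-validation scores: %s" % scores)

# Grid search: cross-validate every candidate C, then refit the best one on all training data.
# The candidate values below are an assumption chosen for illustration.
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

print("best parameters: %s" % grid.best_params_)
print("best cross-validation accuracy: %.2f" % grid.best_score_)
# The held-out test set is only used once, after the parameters have been chosen.
print("test accuracy: %.2f" % grid.score(X_test, y_test))
```

Keeping the test set out of the search is the point of the design: the cross-validation score selects the parameters, and the test score estimates how the selected model generalizes.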

Model Complexity


In [2]:
%matplotlib inline
from plot_forest import plot_forest_interactive
plot_forest_interactive()



Processing Pipelines
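
As a rough illustration of the idea, a pipeline chains preprocessing steps and a final estimator behind the same fit/predict/score interface shown in the API review. The choice of StandardScaler as the preprocessing step is an assumption made for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# make_pipeline names each step after its lowercased class name.
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# fit() learns the scaling on the training data only, then fits the classifier
# on the scaled data; score() applies the same scaling to the test data.
pipe.fit(X_train, y_train)

step_names = [name for name, _ in pipe.steps]
print("step names: %s" % step_names)
print("accuracy: %.2f" % pipe.score(X_test, y_test))
```

Because the pipeline is itself an estimator, it can be passed to cross-validation or grid search, which keeps the preprocessing inside each training fold.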

Real World Data


In [3]:
X = [{'age': 15.9, 'likes puppies': 'yes', 'location': 'Tokyo'},
     {'age': 21.5, 'likes puppies': 'no',  'location': 'New York'},
     {'age': 31.3, 'likes puppies': 'no',  'location': 'Paris'},
     {'age': 25.1, 'likes puppies': 'yes', 'location': 'New York'},
     {'age': 63.6, 'likes puppies': 'no',  'location': 'Tokyo'},
     {'age': 14.4, 'likes puppies': 'yes', 'location': 'Tokyo'}]

from sklearn.feature_extraction import DictVectorizer
vect = DictVectorizer(sparse=False).fit(X)
print(vect.transform(X))
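# get_feature_names() was removed in recent scikit-learn; get_feature_names_out() returns an array.
print("feature names: %s" % vect.get_feature_names_out().tolist())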


[[ 15.9   0.    1.    0.    0.    1. ]
 [ 21.5   1.    0.    1.    0.    0. ]
 [ 31.3   1.    0.    0.    1.    0. ]
 [ 25.1   0.    1.    1.    0.    0. ]
 [ 63.6   1.    0.    0.    0.    1. ]
 [ 14.4   0.    1.    0.    0.    1. ]]
feature names: ['age', 'likes puppies=no', 'likes puppies=yes', 'location=New York', 'location=Paris', 'location=Tokyo']

Text Processing and Classification
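
One possible sketch of this topic: a bag-of-words representation fed into a linear classifier. The four-document corpus and its labels are hypothetical and exist only to keep the example small.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: 1 = positive, 0 = negative.
docs = ["I love puppies", "puppies are great", "I hate rain", "rain is awful"]
labels = [1, 1, 0, 0]

# CountVectorizer lowercases the text and, by default, keeps tokens of two or
# more characters, so the single-letter word "I" is dropped.
vect = CountVectorizer().fit(docs)
X_text = vect.transform(docs)
vocabulary = sorted(vect.vocabulary_)
print("vocabulary: %s" % vocabulary)
print("shape: %s" % (X_text.shape,))

# The same steps chained into one estimator that accepts raw strings.
text_clf = make_pipeline(CountVectorizer(), LogisticRegression())
text_clf.fit(docs, labels)
predictions = text_clf.predict(["puppies are lovely", "rain is awful today"])
print("predictions: %s" % predictions)
```

Words that never appeared during fitting (such as "lovely" and "today" here) are simply ignored at prediction time.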

Learning on Large-Scale Datasets
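
A hedged sketch of one common approach to this topic: incremental learning with partial_fit, so the model only sees one mini-batch at a time. The synthetic dataset, the batch size, and the use of SGDClassifier are assumptions for illustration; in practice each batch would be read from disk or a stream.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in for a dataset too large to fit in memory (assumed sizes).
X, y = make_classification(n_samples=10000, n_clusters_per_class=1, random_state=0)
X_train, y_train = X[:8000], y[:8000]
X_test, y_test = X[8000:], y[8000:]

clf = SGDClassifier(random_state=0)
# partial_fit needs the full set of class labels on its first call,
# because a single batch might not contain every class.
classes = np.unique(y)

batch_size = 1000
n_batches = 0
for start in range(0, len(X_train), batch_size):
    stop = start + batch_size
    # Each call updates the existing weights instead of refitting from scratch.
    clf.partial_fit(X_train[start:stop], y_train[start:stop], classes=classes)
    n_batches += 1

print("batches seen: %d" % n_batches)
print("accuracy: %.2f" % clf.score(X_test, y_test))
```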

